Efficient Data Mining: Scripting and Scalable Parallel Algorithms

نویسندگان

  • Peter Christen
  • Markus Hegland
  • Ole M. Nielsen
  • Stephen Roberts
  • Peter Strazdins
  • Tatiana Semenova
  • Irfan Altas
چکیده

This paper presents our approach to data mining that allows the coupling of parallel applications with a scripting language resulting in an efficient and flexible toolbox. Parallel algorithms which are scalable both in data size and number of processors are a key issue to be able to solve the ever increasing problems in data mining. On the other hand, data mining applications should be flexible to allow interactive data exploration. By using a toolbox written in a scripting language we are able to steer parallel applications in a flexible way, thus fulfilling the needs of a data miner for fast interactive data analysis. The chosen approach is discussed and first results are presented.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable Data Mining for Rules

Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...

متن کامل

Big Data Frequent Pattern Mining

Frequent pattern mining is an essential data mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. In...

متن کامل

A New Approach for Frequent Itemset Data Mining in Hadoop Environment

Frequent pattern mining is an essential data mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. In...

متن کامل

Parallel tree-projection-based sequence mining algorithms

Discovery of sequential patterns is becoming increasingly useful and essential in many scientific and commercial domains. Enormous sizes of available datasets and possibly large number of mined patterns demand efficient, scalable, and parallel algorithms. Even though a number of algorithms have been developed to efficiently parallelize frequent pattern discovery algorithms that are based on the...

متن کامل

Parallel Classification on SMP Systems

This paper presents fast scalable decision-tree-based classification algorithms targeting shared-memory systems. The algorithms are based on the sequential SPRINT classifier and span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This is extended with task pipelining and dynamic load balancing to yield more efficient schemes. The ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001